Neural Diversity Regularizes Hallucinations in Language Models

Chakrabarti, Kushal, Balachundhar, Nirmal

arXiv.org Artificial Intelligence

Language models continue to hallucinate despite increases in parameters, compute, and data. We propose neural diversity -- decorrelated parallel representations -- as a principled mechanism that reduces hallucination rates at fixed parameter and data budgets. While existing mitigation strategies largely target accuracy, we provide the first formal tail bounds for hallucination probability in ensembled language models, reframing it as a second-moment reliability problem and explaining 94.3% of the empirical reliability variation seen across parallel configurations. We introduce ND-LoRA (Neural Diversity Low-Rank Adaptation), combining parallel LoRA adapters with Barlow Twins regularization, and reduce hallucinations by up to 25.6% (14.6% on average) while preserving general accuracy. Ablations show that LoRA adapters and regularization act synergistically; causal interventions identify neural diversity as the mediating factor; and correlational studies indicate the scale of the effect: a 0.1% increase in neural correlation is associated with a 3.8% increase in hallucination rate. Finally, task-dependent optimality emerges: different tasks require different optimal amounts of neural diversity. Together, our results highlight neural diversity as a third axis of scaling -- orthogonal to parameters and data -- that improves the reliability of language models at fixed budgets.
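As a rough illustration of the "neural correlation" quantity that the abstract associates with hallucination rate, one could average the absolute pairwise correlation between parallel representations. This helper is a hypothetical sketch, not ND-LoRA's actual metric:

```python
import numpy as np

def mean_neural_correlation(reps):
    """Hypothetical diversity metric: mean absolute Pearson correlation
    between flattened pairs of parallel representations.
    Lower values correspond to more neural diversity."""
    vals = []
    for i in range(len(reps)):
        for j in range(i + 1, len(reps)):
            a, b = reps[i].ravel(), reps[j].ravel()
            vals.append(abs(np.corrcoef(a, b)[0, 1]))
    return float(np.mean(vals))

rng = np.random.default_rng(0)
# Three toy "parallel adapter" outputs for the same input batch.
parallel = [rng.standard_normal((4, 8)) for _ in range(3)]
score = mean_neural_correlation(parallel)
```

Identical copies score near 1.0, while independent random representations score near 0, matching the intuition that decorrelation is what the Barlow Twins regularizer encourages.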


Pre-train to Gain: Robust Learning Without Clean Labels

Szczecina, David, Pellegrino, Nicholas, Fieguth, Paul

arXiv.org Artificial Intelligence

Training deep networks with noisy labels leads to poor generalization and degraded accuracy due to overfitting to label noise. Existing approaches for learning with noisy labels often rely on the availability of a clean subset of data. By pre-training a feature-extractor backbone without labels using self-supervised learning (SSL), followed by standard supervised training on the noisy dataset, we can train a more noise-robust model without requiring a subset with clean labels. We evaluate the use of SimCLR and Barlow Twins as SSL methods on CIFAR-10 and CIFAR-100 under synthetic and real-world noise. Across all noise rates, self-supervised pre-training consistently improves classification accuracy and enhances downstream label-error detection (F1 and Balanced Accuracy). The performance gap widens as the noise rate increases, demonstrating improved robustness. Notably, our approach achieves results comparable to ImageNet pre-trained models at low noise levels, while substantially outperforming them under high noise conditions.
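The two-stage recipe can be sketched in terms of data flow. This is a toy stand-in (the function names and the linear-head fit are illustrative, not the authors' code): stage 1 uses no labels at all; stage 2 reuses the pretrained backbone and trains only on the noisy labels.

```python
import numpy as np

def ssl_pretrain(unlabeled_x, dim=16, seed=0):
    """Stand-in for SimCLR/Barlow Twins pretraining: returns a fixed
    projection here, purely to show that this stage sees no labels."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((unlabeled_x.shape[1], dim))

def supervised_finetune(backbone_W, x, noisy_y, n_classes):
    """Fit a linear head on frozen backbone features via least squares,
    a stand-in for supervised training on the noisy labels."""
    feats = np.maximum(x @ backbone_W, 0)          # ReLU backbone features
    onehot = np.eye(n_classes)[noisy_y]
    head, *_ = np.linalg.lstsq(feats, onehot, rcond=None)
    return head

rng = np.random.default_rng(1)
x_unlab = rng.standard_normal((100, 8))            # unlabeled pool
x_train = rng.standard_normal((50, 8))             # noisy-labeled set
noisy_y = rng.integers(0, 3, size=50)

W = ssl_pretrain(x_unlab)                          # stage 1: label-free
head = supervised_finetune(W, x_train, noisy_y, n_classes=3)  # stage 2
```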


A Appendix

Neural Information Processing Systems

This appendix collects the proofs for MetaMask. Section A.2.1 proves the equality part of Theorem 5.2 and the bounds on the supervised cross-entropy loss, where M denotes the dimensional mask in MetaMask's training paradigm; Equation 20 is supported by an evidence example in Figure 5, which follows Theorem 5.1 in that the self-paced dimensional mask jointly enhances the gradient. With the proofs of Sections A.2.1 and A.2.2 in place, Theorem 5.2 is substituted into Theorem 5.1 to compare lower bounds: the lower bound obtained with the masked representation, i.e., MetaMask, is larger, so the approach better bounds the downstream classification risk. The dimensional confounder is defined, from the dimensional perspective, as a negative factor that may lead to model degradation; MetaMask is trained with a fixed learning rate instead of the cosine annealing strategy.


On the Optimal Representation Efficiency of Barlow Twins: An Information-Geometric Interpretation

Zhang, Di

arXiv.org Machine Learning

Self-supervised learning (SSL) has emerged as a dominant paradigm for learning representations from unlabeled data [5]. Among various SSL approaches, methods based on redundancy reduction, such as Barlow Twins [7], have demonstrated exceptional performance. These methods operate on the principle of making the cross-correlation matrix between two distorted views of the data close to the identity matrix. While empirically successful, a deep theoretical explanation of why this objective leads to high-quality representations is still developing. A key desirable property of a good representation space is efficiency--the degree to which it utilizes its available dimensions to capture semantically meaningful, non-redundant information. An inefficient representation might suffer from dimensional collapse [4], where many dimensions are redundant or encode correlated information, limiting the representation's expressivity and suitability for downstream tasks. In this paper, we address this gap by proposing a novel information-geometric framework [1] for quantifying representation efficiency. Our core contributions are threefold: 1. We formally define the statistical manifold of representations and introduce a measure of representation efficiency η based on the spectrum of the average Fisher Information Matrix (FIM).
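The redundancy-reduction principle described above can be made concrete. Below is a minimal NumPy sketch of the Barlow Twins objective from Zbontar et al. [7] (the small off-diagonal weight follows that paper's λ); it is an illustration, not this paper's code:

```python
import numpy as np

def barlow_twins_loss(z1, z2, lam=5e-3):
    """Barlow Twins objective: push the cross-correlation matrix between
    two batch-standardized views toward the identity matrix."""
    n = z1.shape[0]
    z1 = (z1 - z1.mean(axis=0)) / (z1.std(axis=0) + 1e-12)
    z2 = (z2 - z2.mean(axis=0)) / (z2.std(axis=0) + 1e-12)
    c = z1.T @ z2 / n                               # d x d cross-correlation
    on_diag = np.sum((np.diag(c) - 1.0) ** 2)       # invariance term
    off_diag = np.sum(c ** 2) - np.sum(np.diag(c) ** 2)  # redundancy term
    return on_diag + lam * off_diag

rng = np.random.default_rng(0)
z = rng.standard_normal((256, 32))
aligned = barlow_twins_loss(z, z + 0.05 * rng.standard_normal(z.shape))
independent = barlow_twins_loss(z, rng.standard_normal(z.shape))
```

Two near-identical views yield a small loss, while independent views are heavily penalized on the diagonal; a fully efficient representation in the paper's sense would use all d dimensions with no off-diagonal correlation.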


Contrastive Self-Supervised Learning at the Edge: An Energy Perspective

Famá, Fernanda, Pereira, Roberto, Kalalas, Charalampos, Dini, Paolo, Qendro, Lorena, Kawsar, Fahim, Malekzadeh, Mohammad

arXiv.org Artificial Intelligence

While contrastive learning (CL) shows considerable promise in self-supervised representation learning, its deployment on resource-constrained devices remains largely underexplored. The substantial computational demands required for training conventional CL frameworks pose a set of challenges, particularly in terms of energy consumption, data availability, and memory usage. We conduct an evaluation of four widely used CL frameworks: SimCLR, MoCo, SimSiam, and Barlow Twins. We focus on the practical feasibility of these CL frameworks for edge and fog deployment, and introduce a systematic benchmarking strategy that includes energy profiling and reduced training data conditions. Our findings reveal that SimCLR, contrary to its perceived computational cost, demonstrates the lowest energy consumption across various data regimes. Finally, we also extend our analysis by evaluating lightweight neural architectures when paired with CL frameworks. Our study aims to provide insights into the resource implications of deploying CL in edge/fog environments with limited processing capabilities and opens several research directions for its future optimization. Over the years, a variety of contrastive learning (CL) approaches have been developed, including popular frameworks such as SimCLR [1], MoCo [2], BYOL [3], SimSiam [4], and Barlow Twins [5], each offering specific advantages and trade-offs. These frameworks aim to learn representations by distinguishing between similar (positive) and dissimilar (negative) samples in a latent space. While some methods rely on large negative sample sets to achieve high-quality representations, others bypass the need for negative pairs through momentum encoders or predictor networks.


DinoTwins: Combining DINO and Barlow Twins for Robust, Label-Efficient Vision Transformers

Podsiadly, Michael, Lay, Brendon K

arXiv.org Artificial Intelligence

Training AI models to understand images without costly labeled data remains a challenge. We combine two techniques--DINO (teacher-student learning) and Barlow Twins (redundancy reduction)--to create a model that learns better with fewer labels and less compute. While both DINO and Barlow Twins have independently demonstrated strong performance in self-supervised learning, each comes with limitations--DINO may be sensitive to certain augmentations, and Barlow Twins often requires batch sizes too large to fit on consumer hardware. By combining the redundancy-reduction objective of Barlow Twins with the self-distillation strategy of DINO, we aim to leverage their complementary strengths. We train a hybrid model on the MS COCO dataset using only 10% of labeled data for linear probing, and evaluate its performance against standalone DINO and Barlow Twins implementations. Preliminary results show that the combined approach achieves comparable loss and classification accuracy to DINO while maintaining strong feature representations. Attention visualizations further suggest improved semantic segmentation capability in the hybrid model. This combined method offers a scalable, label-efficient alternative for training ViTs in resource-constrained environments.
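The simplest way to combine the two objectives is a weighted sum of the per-batch losses. The abstract does not specify the weighting, so the convex mix and the value of alpha below are assumptions, sketched only to show the shape of the hybrid objective:

```python
def dinotwins_loss(dino_loss, bt_loss, alpha=0.5):
    """Hypothetical hybrid objective: a convex combination of the DINO
    self-distillation loss and the Barlow Twins redundancy-reduction loss.
    alpha is an assumed hyperparameter, not taken from the paper."""
    return alpha * dino_loss + (1.0 - alpha) * bt_loss

total = dinotwins_loss(dino_loss=1.2, bt_loss=0.8)
```

In practice one would tune alpha (or an unnormalized weight on the Barlow Twins term) so that neither objective dominates early training.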



Enhancing User Sequence Modeling through Barlow Twins-based Self-Supervised Learning

Liu, Yuhan, Ning, Lin, Wu, Neo, Singhal, Karan, Mansfield, Philip Andrew, Berlowitz, Devora, Prakash, Sushant, Green, Bradley

arXiv.org Artificial Intelligence

User sequence modeling is crucial for modern large-scale recommendation systems, as it enables the extraction of informative representations of users and items from their historical interactions. These user representations are widely used for a variety of downstream tasks to enhance users' online experience. A key challenge for learning these representations is the lack of labeled training data. While self-supervised learning (SSL) methods have emerged as a promising solution for learning representations from unlabeled data, many existing approaches rely on extensive negative sampling, which can be computationally expensive and may not always be feasible in real-world scenarios. In this work, we propose an adaptation of Barlow Twins, a state-of-the-art SSL method, to user sequence modeling by incorporating suitable augmentation methods. Our approach aims to mitigate the need for large negative sample batches, enabling effective representation learning with smaller batch sizes and limited labeled data. We evaluate our method on the MovieLens-1M, MovieLens-20M, and Yelp datasets, demonstrating that our method consistently outperforms the widely-used dual encoder model across three downstream tasks, achieving an 8%-20% improvement in accuracy. Our findings underscore the effectiveness of our approach in extracting valuable sequence-level information for user modeling, particularly in scenarios where labeled data is scarce and negative examples are limited.
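The abstract does not name its augmentations; a common choice for interaction sequences is random item masking. The sketch below is one hypothetical option for generating the two views that a Barlow Twins objective needs:

```python
import numpy as np

def mask_items(seq, rate=0.2, mask_id=0, rng=None):
    """Hypothetical sequence augmentation: randomly replace a fraction of
    items in a user's interaction history with a mask token."""
    rng = rng or np.random.default_rng()
    seq = np.asarray(seq).copy()
    seq[rng.random(seq.shape[0]) < rate] = mask_id
    return seq

rng = np.random.default_rng(0)
history = np.arange(1, 21)   # toy 20-item interaction history (item ids)
# Two stochastic views of the same user, to be encoded and decorrelated.
view1, view2 = mask_items(history, rng=rng), mask_items(history, rng=rng)
```

Because the objective only needs the cross-correlation between the two encoded views, no negative users are sampled, which is the stated advantage over dual-encoder-style training.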


Supervised Pretraining for Material Property Prediction

Rahman, Chowdhury Mohammad Abid, Romero, Aldo H., Gyawali, Prashnna K.

arXiv.org Artificial Intelligence

Accurate prediction of material properties facilitates the discovery of novel materials with tailored functionalities. Deep learning models have recently shown superior accuracy and flexibility in capturing structure-property relationships. However, these models often rely on supervised learning, which requires large, well-annotated datasets -- an expensive and time-consuming process. Self-supervised learning (SSL) offers a promising alternative by pretraining on large, unlabeled datasets to develop foundation models that can be fine-tuned for material property prediction. In this work, we propose supervised pretraining, where available class information serves as surrogate labels to guide learning, even when downstream tasks involve unrelated material properties. We evaluate this strategy on two state-of-the-art SSL models and introduce a novel framework for supervised pretraining. To further enhance representation learning, we propose a graph-based augmentation technique that injects noise to improve robustness without structurally deforming material graphs. The resulting foundation models are fine-tuned for six challenging material property predictions, achieving significant performance gains over baselines, ranging from 2% to 6.67% improvement in mean absolute error (MAE) and establishing a new benchmark in material property prediction. This study represents the first exploration of supervised pretraining with surrogate labels in material property prediction, advancing methodology and application in the field.
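One way to inject noise without structurally deforming a material graph is to perturb node features while leaving the edge list untouched. The helper below is a hypothetical sketch of that idea, not the paper's implementation:

```python
import numpy as np

def feature_noise_augment(node_feats, edges, sigma=0.05, rng=None):
    """Hypothetical graph augmentation: Gaussian noise perturbs node
    features; the edge list (the material's bonded structure) is
    returned unchanged, so the graph topology is preserved."""
    rng = rng or np.random.default_rng()
    noisy = node_feats + sigma * rng.standard_normal(node_feats.shape)
    return noisy, edges

rng = np.random.default_rng(0)
feats = rng.standard_normal((5, 4))        # 5 atoms, 4 features each
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]   # bonds: structure to preserve
noisy_feats, same_edges = feature_noise_augment(feats, edges, rng=rng)
```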


Projection Head is Secretly an Information Bottleneck

Ouyang, Zhuo, Hu, Kaiwen, Zhang, Qi, Wang, Yifei, Wang, Yisen

arXiv.org Artificial Intelligence

Recently, contrastive learning has risen to prominence as a paradigm for extracting meaningful data representations. Among various special designs, adding a projection head on top of the encoder during training and removing it for downstream tasks has proven to significantly enhance the performance of contrastive learning. However, despite its empirical success, the underlying mechanism of the projection head remains under-explored. In this paper, we develop an in-depth theoretical understanding of the projection head from the information-theoretic perspective. By establishing theoretical guarantees on the downstream performance of the features before the projector, we reveal that an effective projector should act as an information bottleneck, filtering out the information irrelevant to the contrastive objective. Based on these theoretical insights, we introduce modifications to projectors with training and structural regularizations. We believe our theoretical understanding of the role of the projection head will inspire more principled and advanced designs in this field. In recent years, contrastive learning has emerged as a promising representation learning paradigm and exhibited impressive performance without supervised labels (Chen et al., 2020; He et al., 2020; Zbontar et al., 2021). The core idea of contrastive learning is simple: pull the augmented views of the same sample (i.e., positive samples) together while pushing independent samples (i.e., negative samples) apart. To improve the downstream performance of contrastive learning, researchers have proposed various special training objectives and architecture designs (Grill et al., 2020; Wang et al., 2021; Guo et al., 2023; Wang et al., 2023; 2024; Du et al., 2024).
Among them, one of the most widely-used techniques is the projection head (i.e., projector) (Chen et al., 2020), which is a shallow layer following the backbone during pretraining and is discarded in downstream tasks like image classification and object detection. It has been shown that the features before the projector (denoted as encoder features) exhibit much better downstream performance than the features after the projector (denoted as projector features) across various applications (Jing et al., 2021; Gupta et al., 2022). Inspired by the success of the projection head in contrastive learning, researchers also extend this architecture to other representation learning paradigms and achieve significant improvements (Sariyildiz et al., 2022; Zhou et al., 2021). However, although the projection head has been widely adopted, the understanding of the underlying mechanism behind it is still quite limited. In this paper, we aim to establish a theoretical analysis of the relationship between the projection head and the downstream performance of contrastive learning.
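The train-with-projector, discard-for-downstream pattern the paper analyzes can be sketched with toy NumPy stand-ins (illustrative weights and shapes, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.standard_normal((16, 32))    # toy backbone weights
W_proj = rng.standard_normal((32, 8))    # toy projection-head weights

def encoder(x):
    """Backbone: produces the 'encoder features' kept for downstream tasks."""
    return np.maximum(x @ W_enc, 0.0)

def projector(h):
    """Shallow projection head: its output ('projector features') is fed to
    the contrastive loss during pretraining, then discarded."""
    return h @ W_proj

x = rng.standard_normal((4, 16))
h = encoder(x)       # used downstream (e.g., classification, detection)
z = projector(h)     # used only inside the pretraining objective
```

The paper's claim is that the projector acts as an information bottleneck: z is optimized against the contrastive objective, which lets h retain task-relevant information that the loss would otherwise strip out.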